System and Method for Recording and Reproducing Multimedia 
Field of the Invention 

[01] This invention relates generally to processing multimedia, and more 
particularly to recording video signals, audio signals, text, and binary data on 
storage media, and for reproducing selected portions of the multimedia. 

Background of the Invention 

[02] In order to quickly review and analyze a video, for example a movie, a 
recorded sporting event or a news broadcast, a summary of the video can be 
generated. A number of techniques are known for summarizing uncompressed and 
compressed videos. 

[03] The conventional practice is to first segment the video into scenes or 'shots', 
and then to extract low and high level features. The low level features are usually 
based on syntactic characteristics such as color, motion, and audio components, 
while the high level features capture semantic information. 

[04] The features are then classified, and the shots can be further segmented 
according to the classified features. The segments can be converted to short image 
sequences, for example, one- or two-second 'clips' or 'still' frames, and labeled 
and indexed. Thus, the reviewer can quickly scan the summary to select portions of 
the video to play back in detail. The problem with such summaries is that 
the playback can only be based on the features and classifications used to generate 
the summary. 

[05] In order to further assist the review, the segments can be subjectively rank 
ordered according to a relative importance. Thus, important events in the video, 
such as climactic scenes, or goal scoring opportunities can be quickly identified, 
see, Fujiwara et al. "Abstractive Description of Video Using Summary DS," 
Point-illustrated Broadband + Mobile Standard MPEG Textbook, ASCII Corp., p. 177, 
Figs. 5-24, February 11, 2003, also "ISO/IEC 15938-5:2002 Information 
technology - Multimedia content description interface - Part 5: Multimedia 
Description Schemes," 2002. After an important video segment has been located, 
the viewer can use fast-forward or fast-reverse capabilities of the playback device 
to view segments of interest, see "DVR-7000 Instruction Manual," Pioneer Co., 
Ltd., p. 49, 2001. 

[06] Another technique for summarizing a news video uses motion activity 
descriptors, see U.S. Patent Application serial number 09/845,009, titled "Method 
for Summarizing a Video Using Motion Descriptors," filed by Divakaran, et al., on 
April 27, 2001. A technique for generating soccer highlights uses a combination of 
video and audio features, see U.S. Patent Application serial number 10/046,790, 
titled "Summarizing Videos Using Motion Activity Descriptors Correlated with 
Audio Features," filed by Cabasson, et al., on January 15, 2002. Audio and video 
features can also be used to generate highlights for news, soccer, baseball and golf 
videos, see U.S. Patent Application serial number 10/374,017, titled "Method and 
System for Extracting Sports Highlights from Audio Signals," filed by Xiong, et 
al., on February 25, 2003. Those techniques extract key segments of notable events 
from the video, such as a scoring opportunity or an introduction to a news story. The 
original video is thus represented by an abstract that includes the extracted key 
segments. The key segments can provide entry points into the original content and 
thus allow flexible and convenient navigation. 

[07] There are a number of problems with prior art video recording, 
summarization and playback. First, the summary is based on some preconceived 
notion of the extracted features, classifications, and importance, instead of those of 
the viewer. Second, if importance levels are used, the importance levels are usually 
quantized to a very small number of levels, for example, five or fewer. More often, 
only two levels are used, i.e., the interesting segments that are retained, and the rest 
of the video that is discarded. 

[08] In particular, the hierarchical description proposed in the MPEG-7 standard 
is very cumbersome if a fine quantization of the importance is used because the 
number of levels in the hierarchy becomes very large, which in turn requires 
management of too many levels. 

[09] The MPEG-7 description requires editing of the metadata whenever the 
content is edited. For example, if a segment is cut out of the original content, all 
the levels affected by the cut need to be modified. That can get cumbersome 
quickly as the number of editing operations increases. 

[010] The importance levels are highly subjective, and highly context dependent. 
That is, the importance levels for sports videos depend on the particular sports 
genre, and are totally inapplicable to movies and news programs. Further, the 
viewer has no control over the length of the summary to be generated. 

[011] The small number of subjective levels used by the prior art techniques makes 
it practically impossible for the viewer to edit and combine several different videos 
based on the summaries to generate a derivative video that reflects the interests of 
the viewer. 

[012] Therefore, there is a need to record and reproduce a video in a manner that 
can be controlled by the viewer. Furthermore, there is a need for specifying 
importance levels that are content independent, and not subjective. In addition, 
there is a need to provide more than a small number of discrete importance levels. 
Lastly, there is a need to enable the viewer to generate a summary of any length, 
depending on a viewer-selected level of importance. 

Summary of the Invention 

[013] A system and method summarize multimedia stored in a compressed 
multimedia file partitioned into segments. 

[014] An associated metadata file includes index information and importance level 
information for each segment in the sequence. In a preferred embodiment, the files 
are stored on a storage medium such as a DVD. 

[015] The importance information is continuous over a closed interval. An 
importance level threshold, or range, is selected in the closed interval. The 
importance level can be viewer selected. 

[016] When the files are read, only segments of the multimedia having a particular 
importance level greater than the importance level threshold are reproduced. 

Brief Description of the Drawings 

[017] Figure 1 is a block diagram of a system for reproducing multimedia 
according to the invention; 

[018] Figure 2 is a block diagram of a file structure for multimedia according to 
the invention; 

[019] Figure 3 is a block diagram of a data structure of a metadata file according to 
the invention; 

[020] Figure 4 is a block diagram of indexing the multimedia according to the 
invention using the metadata file; 

[021] Figure 5 is a graph representing an abstractive reproduction according to the 
invention; 

[022] Figure 6A is a graph of an alternative abstractive reproduction according to 
the invention; 

[023] Figure 6B is a graphics image representing an abstraction ratio; 

[024] Figure 7 is a block diagram of a system for recording compressed 
multimedia files and metadata files on storage media according to the invention; 

[025] Figure 8 is a graph of an alternative abstractive reproduction according to 
the invention; 

[026] Figure 9 is a graph of an alternative abstractive reproduction according to 
the invention; 

[027] Figure 10 is a graph of an alternative abstractive reproduction according to 
the invention; and 

[028] Figure 11 is a block diagram of a system for recording multimedia according 
to the invention. 

Detailed Description of the Preferred Embodiment 
[029] Reproducing System Structure 

[030] Figure 1 shows a system 100 for reproducing multimedia, where the content 
of the multimedia is, for example, video signals, audio signals, text, and binary 
data. The system includes storage media 1, such as a disc or tape, for persistently 
storing multimedia and metadata organized as files in directories. In the preferred 
embodiment, the multimedia is compressed using, e.g., MPEG and AC-3 
standards. The multimedia has been segmented, classified, and indexed using 
known techniques. The indexing can be based on time or frame number, see U.S. 
Patent 6,628,892, incorporated herein by reference. 

[031] The metadata includes index and importance information. As an advantage 
of the present invention, and in contrast with the prior art, the importance 
information is continuous over a closed interval, e.g., [0, 1] or [0, 100]. Therefore, 
the importance level is not expressed in terms of 'goal' or 'head-line-news-time', but 
rather as a real number, e.g., the importance is 0.567 or +73.64. 

[032] As an additional advantage, the continuous importance information is 
context and content independent, and not highly subjective as in the prior art. Both 
of these features enable a viewer to reproduce the multimedia to any desired 
length. 

[033] The metadata can be binary or text, and if necessary, protected by 
encryption. The metadata can include file attributes such as dates, validity codes, 
file types, etc. The hierarchical file and directory structure for the multimedia and 
metadata are described with respect to Figure 2. 

[034] As shown in Figure 1, a reader drive 10 reads the multimedia and metadata 
files from the storage media 1. A read buffer 11 temporarily stores data read by the 
reader drive 10. A demultiplexer 12 acquires, sequentially, multimedia data from 
the read buffer, and separates the multimedia data into a video stream and an audio 
stream. 

[035] A video decoder 13 processes a video signal 17, and an audio decoder 14 
processes the audio signal 18 for an output device, e.g., a television monitor 19. 

[036] A metadata analyzing section 15 sequentially acquires metadata from the 
read buffer 11. A reproduction control section 16, including a processor, controls 
the system 100. The functionality of the metadata analyzing section 15 can be 
implemented with software, and can be incorporated as part of the reproduction 
control section 16. 

[037] It should be noted that for any implementation described herein the 
multimedia files and the metadata files do not need to be recorded and reproduced 
concurrently. In fact, the metadata file can be analyzed independently to enable the 
viewer to quickly locate segments of interest in the multimedia files. In addition, 
the multimedia and the metadata can be multiplexed into a single file, and 
demultiplexed when read. 

[038] File and Directory Structure 

[039] Figure 2 shows the hierarchical structure 200 of the files and directories 
stored on the media 1. A root directory 20 includes a multimedia directory 21 and a 
metadata directory 22. The multimedia directory 21 stores information 
management files 23, multimedia files 24, and backup files 25. The metadata 
directory 22 stores metadata files 26. It should be noted that other directory and file 
structures are possible. The data in the multimedia files 24 contains the 
multiplexed video and/or audio signals. 

[040] Note that the information management files 23 and/or the multimedia 
data files 24 can include flags indicating the presence, absence, or invalidity of 
the metadata. 

[041] Metadata Structure 

[042] Figure 3 shows the hierarchical structure 300 of the metadata files 26. There 
are five levels A-E in the hierarchy, including metadata 30 at a highest level, 
followed by management information 31, general information 32, shot information 
33, and index and importance information 34. 

[043] The metadata managing information 31 at level B includes a comprehensive 
description 31a of the overall metadata 30, video object (VOB) metadata 
information search pointer entries 31b, and associated VOB information entries 
31c. The associations do not need to be one-to-one; for instance, there can be multiple 
pointers 31b for one information entry 31c, or one information entry for multiple 
VOBs, or none at all. 

[044] At the next level C, each VOB information entry 31c includes metadata 
general information 32a, and video shot map information 32b. The metadata 
general information 32a can include program names, producer names, 
actor/actress/reporter/player names, an explanation of the content, broadcast date, 
time, and channel, and so forth. The exact correspondences are stored as a table in 
the general information entry 32a. 

[045] At the next level D, for each video shot map information entry 32b there is 
video shot map general information 33a, and one or more video shot entries 33b. 
As above, there does not need to be a one-to-one correspondence between these 
entries. The exact correspondences are stored as a table in the general information 
entry 33a. 

[046] At the next level E, for each video shot entry 33b, there are start time 
information 34a, end time information 34b, and an importance level 34c. As stated 
above, frame numbers can also index the multimedia. The index information can 
be omitted if the index data can be obtained from the video shot reproducing time 
information 34a. Any ranking system can be used for indicating the relative 
importance. As stated above, the importance level can be continuous and content 
independent. The importance level can be added manually or automatically. 
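
The five-level hierarchy of Figure 3 can be pictured as nested records. The following is only an illustrative sketch, not a file format defined by this description: the class names, the field names, the dictionary used for the search pointers 31b, and the normalization of the importance level to [0, 1] are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VideoShotEntry:
    """Level E: start time 34a, end time 34b, importance level 34c."""
    start_time: float
    end_time: float
    importance: float  # assumed normalized to the closed interval [0, 1]

    def __post_init__(self) -> None:
        if self.end_time < self.start_time:
            raise ValueError("end_time precedes start_time")
        if not 0.0 <= self.importance <= 1.0:
            raise ValueError("importance outside [0, 1]")


@dataclass
class VideoShotMap:
    """Level D: general information 33a plus video shot entries 33b."""
    general_info: Dict[str, str] = field(default_factory=dict)
    shots: List[VideoShotEntry] = field(default_factory=list)


@dataclass
class VobMetadata:
    """Level C: general information 32a plus shot map information 32b."""
    general_info: Dict[str, str] = field(default_factory=dict)
    shot_map: VideoShotMap = field(default_factory=VideoShotMap)


@dataclass
class Metadata:
    """Levels A-B: description 31a, search pointers 31b, VOB entries 31c."""
    description: str = ""
    # Hypothetical pointer table: VOB number -> index into vob_entries.
    search_pointers: Dict[int, int] = field(default_factory=dict)
    vob_entries: List[VobMetadata] = field(default_factory=list)

    def shots_for_vob(self, vob_number: int) -> List[VideoShotEntry]:
        """Follow a search pointer to the shot entries of one VOB."""
        entry = self.vob_entries[self.search_pointers[vob_number]]
        return entry.shot_map.shots
```

Because the pointer table maps VOB numbers to entry indices, several pointers can refer to the same entry, which mirrors the statement that the associations need not be one-to-one.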

[047] Multimedia Indexing 

[048] Figure 4 shows the relationship between the multimedia recorded and 
reproduced according to the invention, and the metadata. Program chain 
information 40 stored in the management information file 23 describes a sequence 
for reproducing multimedia of a multimedia data file 24. The chain information 
includes programs 41 based on a reproducing unit as defined by the program chain 
information 40. Cells 42a-b are based on a reproducing unit as defined by the 
program 41. In digital versatile disk (DVD) type of media, a 'cell' is a data 
structure to represent a portion of a video program. 

[049] Video object information 43a-b describes a reference destination of the 
actual video or audio data corresponding to the reproducing time information, i.e., 
presentation time, designated by the cell 42 described in the management 
information file 23. 

[050] Map tables 44a-b are for offsetting the reproducing time information defined 
by the VOB information 43 and converting the same into actual video data or 
audio data address information. Video object units (VOBU) 45a and 45b describe 
the actual video or audio data in the multimedia data file 24. These data are 
multiplexed in a packet structure, together with the reproducing time information. 
The VOBUs are the smallest units for accessing and reproducing the multimedia. 
A VOBU includes one or more group-of-pictures (GOP) of the content. 

[051] Importance Threshold Based Reproduction 

[052] Figure 5 shows the abstractive reproduction according to the invention, 
where the horizontal axis 51 defines time and the vertical axis 50 defines an 
importance level. As shown in Figure 5, the importance level varies continuously 
over a closed interval 55, e.g., [0, 1] or [0, 100]. Also, as shown, the importance 
level threshold 53 can be varied 56 by the viewer over the interval 55. 

[053] The time is in terms of the video-shot start time information 34a and the 
video-shot end time information 34b of Figure 3. The importance is in terms of the 
video-shot importance level 34c. An example importance curve 52 is evaluated 
according to an importance threshold 53. 

[054] During a reproduction of the multimedia, portions of the multimedia that 
have an importance greater than the threshold 53 are reproduced 58 while portions 
that have an importance less than the threshold are skipped 59. The curve 54 
indicates the portions that are included in the reproduction. The reproduction is 
accomplished using the reproducing control section 16 based on the metadata 
information obtained from the metadata analyzing section 15. 
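
The threshold comparison described above can be sketched in a few lines. This is a minimal illustration, assuming each video shot is given as a (start, end, importance) tuple; the function name and the merging of back-to-back shots into one interval are choices made for the example, not requirements of the invention.

```python
from typing import List, Tuple

Shot = Tuple[float, float, float]  # (start time, end time, importance level)


def reproduced_intervals(shots: List[Shot], threshold: float) -> List[Tuple[float, float]]:
    """Return the time intervals to reproduce (58); everything else is skipped (59)."""
    intervals: List[Tuple[float, float]] = []
    for start, end, importance in sorted(shots):
        if importance <= threshold:
            continue  # importance not greater than the threshold: skip
        if intervals and intervals[-1][1] == start:
            # Contiguous with the previous retained shot: extend that interval.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals
```

Raising the threshold can only remove intervals from the result, so the reproduction becomes shorter as the viewer moves the threshold 53 upward.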

[055] It should be noted that multiple continuous importance levels, or one or 
more importance level ranges can be specified so that only segments having a 
particular importance according to the real number values in the importance ranges 
are reproduced. Alternatively, only the least important segments can be 
reproduced. 
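
Selection by one or more importance ranges can be sketched the same way. The tuple layout and the function names below are assumptions for illustration, and the ranges are treated as closed intervals.

```python
from typing import List, Tuple

Shot = Tuple[float, float, float]  # (start time, end time, importance level)
Range = Tuple[float, float]        # closed importance range (low, high)


def in_any_range(value: float, ranges: List[Range]) -> bool:
    """True if value lies in at least one closed range."""
    return any(low <= value <= high for low, high in ranges)


def select_by_ranges(shots: List[Shot], ranges: List[Range]) -> List[Tuple[float, float]]:
    """Keep only the shots whose importance falls inside one of the ranges."""
    return [(start, end) for start, end, imp in shots if in_any_range(imp, ranges)]
```

Passing a single low range, e.g. [(0.0, 0.2)], reproduces only the least important segments.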

[056] To reproduce a desired program, the information management file 23 is read 
by the reader drive 10. This allows one to determine that the program is configured 
as, e.g., two cells. 

[057] Each cell is described by a VOB number and index information, e.g., a start 
and end time. The time map table 44a for the VOB1 information 43a is used to 
convert each presentation time to a presentation time stamp (PTS), or address 
information in the VOB1 concerned, thus obtaining an actual VOBU 45. 

[058] Likewise, the cell-2 42b is also obtained with a VOBU 45b group of VOB2 
by the use of a time map table 44b of VOB2 information 43b. In this example, a 
cell, in this case, cell 42b, is indexed by the VOB 43b using the time map table 
44b. 
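
A time map lookup of this kind can be sketched as a search in a sorted table. The table layout below, a list of (VOBU start time, address) pairs, is a hypothetical simplification chosen for the example and is not the actual on-disc format of the map tables 44a-b.

```python
import bisect
from typing import List, Tuple


def locate_vobu(time_map: List[Tuple[float, int]], t: float) -> Tuple[int, int]:
    """Return (index, address) of the VOBU whose time span contains t.

    time_map is assumed sorted by start time; each VOBU is assumed to run
    until the start time of the next entry.
    """
    starts = [start for start, _ in time_map]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        raise ValueError("time precedes the first VOBU")
    return i, time_map[i][1]
```

For a table such as [(0.0, 0), (0.5, 2048), (1.0, 6144)], a presentation time of 0.7 resolves to the second VOBU at address 2048.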

[059] The data of the VOBUs 45 are provided sequentially for demultiplexing and 
decoding. The video signal 17 and the audio signal 18 are synchronized using the 
presentation time (PTM) and provided to the output device 19. 

[060] When the viewer selects a desired program, e.g., program 1 41, the cells 42a-b 
that contain the configuration of the relevant program 41 can be found by the 
program chain information 40. The program chain information is thus used to find 
the corresponding VOB as well as the presentation time (PTM). 

[061] The metadata 26 described in Figure 4 is used as follows, and as illustrated 
in Figure 3. First, the metadata information management information 31a is used to 
locate the metadata information search pointer 31b corresponding to the desired 
VOB number. Then, the search pointer 31b is used to locate the VOB metadata 
information 31c. The VOB metadata includes video shot map information, which 
in turn includes the start time, stop time and importance level of each of the video 
shots. Thus, the VOB metadata is used to collect all the shots that have a 
presentation time (PTM) included in the range specified by the start time and end 
time of the cell, as well as their corresponding importance levels. Then, only those 
portions that exceed the desired importance level 53 are retained. 
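
The last two steps, collecting the shots that fall within a cell and retaining those above the desired importance level, can be sketched as follows. The tuple layout and function name are assumptions, and clipping a shot that straddles a cell boundary to the cell's time range is one possible choice for the example rather than something this description prescribes.

```python
from typing import List, Tuple

Shot = Tuple[float, float, float]  # (start time, end time, importance level)


def shots_for_cell(shots: List[Shot], cell_start: float, cell_end: float,
                   threshold: float) -> List[Shot]:
    """Collect shots overlapping [cell_start, cell_end] and keep the important ones."""
    kept: List[Shot] = []
    for start, end, importance in shots:
        # Clip the shot to the cell's presentation time range (an assumption).
        clipped_start = max(start, cell_start)
        clipped_end = min(end, cell_end)
        if clipped_start < clipped_end and importance > threshold:
            kept.append((clipped_start, clipped_end, importance))
    return kept
```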

[062] It should be noted that multiple programs can be selected for reproduction, 
and any number of techniques are possible to concatenate only the reproduced 
segments. 

[063] Alternative Abstractive Reproduction 

[064] Figure 6A shows an alternative abstractive reproduction according to the 
invention, where the vertical axis 50 defines an importance level, the horizontal 
axis 51 defines time, and the continuous curve 52 indicates importance levels. Line 
63 is an importance level threshold, and line 64 is a reproduction for only those 
segments that have a particular importance greater than the threshold. Other 
segments are skipped. 

[065] Abstraction Ratio 

[066] Figure 6B shows an abstraction ratio 60. The abstraction ratio can vary, e.g., 
from 0% to 100%, i.e., over the entire interval 55. The abstraction ratio is shown as a 
graphics image superposed on an output image on the output device 19, which can 
be a playback device. A portion 61 is a current abstraction ratio that is user 
selectable. The threshold 63 is set according to the user selectable current 
abstraction ratio 61. The user can set the abstraction ratio using some input device, 
e.g., a keyboard or remote control 17a, see Figure 1. If the abstraction ratio is 
100%, then the entire multimedia file is reproduced; a ratio of 50% 
reproduces only half of the file. The abstraction ratio can be changed during the 
reproduction. It should be noted that the graphics image can have other forms, for 
example, a sliding bar, or a numerical display in terms of the ratio or actual time. 
Alternatively, the abstraction ratio can be varied automatically by the metadata 
analyzing section 15 or the reproducing control section 16. 
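
One way the threshold could be derived from the abstraction ratio is sketched below. The description does not specify the mapping, so this is an assumption: the sketch picks the lowest threshold at which the total duration of the retained shots no longer exceeds the requested fraction of the whole. The function name and tuple layout are likewise hypothetical.

```python
from typing import List, Tuple

Shot = Tuple[float, float, float]  # (start time, end time, importance level)


def threshold_for_ratio(shots: List[Shot], ratio: float) -> float:
    """Lowest threshold whose reproduction lasts at most ratio * total duration."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    total = sum(end - start for start, end, _ in shots)
    budget = ratio * total

    def kept_duration(threshold: float) -> float:
        return sum(end - start for start, end, imp in shots if imp > threshold)

    # Candidates: below every importance level, then each distinct level.
    candidates = [float("-inf")] + sorted({imp for _, _, imp in shots})
    # The highest candidate keeps nothing, so a match always exists.
    return next(th for th in candidates if kept_duration(th) <= budget)
```

With shot-level importance the achievable durations are discrete, so the reproduced length can fall short of the requested ratio; a ratio of 100% yields a threshold below every importance level, so the entire file is reproduced.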

[067] It should be noted that pointers to the video segments can be sorted in a list 
according to a descending order of importance. Thus, it is possible to obtain a 
summary of any desired length by going down the list in the sorted order, including 
segments until a time length requirement is met. 
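
The sorted-list approach can be sketched as follows, assuming each segment is a (start, end, importance) tuple and the length requirement is a target duration in the same time units; the function name is hypothetical.

```python
from typing import List, Tuple

Shot = Tuple[float, float, float]  # (start time, end time, importance level)


def summarize_to_length(shots: List[Shot], target_length: float) -> List[Tuple[float, float]]:
    """Take segments in descending importance until the target length is met."""
    ranked = sorted(shots, key=lambda shot: shot[2], reverse=True)
    chosen: List[Tuple[float, float]] = []
    used = 0.0
    for start, end, _ in ranked:
        if used >= target_length:
            break  # time length requirement met
        chosen.append((start, end))
        used += end - start
    # Return the chosen segments in playback (time) order.
    return sorted(chosen)
```

Because whole segments are added, the summary can overshoot the target by up to one segment; stopping before the segment that would overshoot is an equally plausible variant.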

[068] Recording System Structure 

[069] Figure 7 shows a block diagram of a system 700 for recording compressed 
multimedia files and metadata files on storage media 2, such as a disc or tape. The 
system includes a video encoder 71 and an audio encoder 72, which take as input 
video signals 78, audio signals 79, text, images, binary data, and the like. The 
outputs of the encoders are multiplexed 73 and stored temporarily in a write buffer 
74 as multimedia data. The outputs are also passed to a metadata generating 
section 75 which also writes output to the write buffer. 

[070] A write drive 70 then writes the multimedia and the metadata to the storage 
media 2 as files under control of a recording control section 76, which includes a 
processor. The files can be written in a compressed format using standard 
multimedia compression techniques such as MPEG and AC-3. Encryption can also 
be used during the recording. It should be noted that the metadata generating 
section 75 can be implemented as software incorporated in recording control 
section 76. 

[071] The encoders extract features from the input signals 78-79, e.g., motion 
vectors, color histograms, audio frequencies, characteristics, and volumes, and 
speech related information. The extracted features are analyzed by the metadata 
generating section 75 to determine segments and their associated index information 
and importance levels. 

[072] It should be noted that, for any implementation, the multimedia files and the 
metadata files do not need to be generated concurrently. For example, the metadata 
can be generated at a later time, and metadata can be added incrementally over time. 

[073] Time Threshold Based Reproduction 

[074] Figure 8 shows an alternative reproduction according to the invention in an 
abstract manner where the vertical axis 50 defines an importance level, the 
horizontal axis 51 defines time, and the continuous curve 52 indicates importance 
levels over time. Line 80 is a variable importance level threshold, and line 81 is a 
reproduction for only those segments that have a particular importance greater than 
the threshold. Other segments are skipped. 

[075] However, in this embodiment, a time threshold is also used. Only segments 
that have a particular importance level greater than the importance level threshold 
and maintain that importance level for an amount of time that is longer than the 
time threshold are reproduced. For example, the segment a1 to a2 is not 
reproduced, while the segment b1 to b2 is reproduced. This eliminates segments 
that are too short in time to enable the viewer to adequately comprehend the 
segment. 
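
The combination of the two thresholds can be sketched on a sampled importance curve. The uniform sampling interval dt and the function names are assumptions made for the example; the description itself does not fix how the curve 52 is represented.

```python
from typing import List, Tuple


def runs_above(curve: List[float], dt: float, threshold: float) -> List[Tuple[float, float]]:
    """Find the time runs during which the sampled curve stays above the threshold."""
    runs: List[Tuple[float, float]] = []
    run_start = None
    for i, value in enumerate(curve):
        if value > threshold and run_start is None:
            run_start = i                           # a run begins at this sample
        elif value <= threshold and run_start is not None:
            runs.append((run_start * dt, i * dt))   # the run ends here
            run_start = None
    if run_start is not None:                       # curve ends while above threshold
        runs.append((run_start * dt, len(curve) * dt))
    return runs


def keep_long_runs(runs: List[Tuple[float, float]], time_threshold: float) -> List[Tuple[float, float]]:
    """Drop runs that do not last longer than the time threshold."""
    return [(start, end) for start, end in runs if end - start > time_threshold]
```

In the sample curve [0.1, 0.9, 0.1, 0.8, 0.8, 0.8, 0.2] with dt = 1.0 and an importance threshold of 0.5, the one-unit run plays the role of segment a1 to a2 and is dropped by a time threshold of 2.0, while the three-unit run plays the role of segment b1 to b2 and is kept.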

[076] Time Threshold Based Reproduction with Additive Segment Extension 

[077] Figure 9 shows an alternative reproduction 900 according to the invention in 
an abstract manner where the vertical axis 50 defines an importance level, the 
horizontal axis 51 defines time, and the curve 52 indicates importance levels over 
time. Line 90 is an importance level threshold, and line 91 is a reproduction for only 
those segments that have a particular importance greater than the threshold. Other 
segments are skipped, as before. In this implementation, as well as alternative 
implementations described below, the amount of extension can vary depending on 
the decisions made by the reproduction control section. 

[078] This embodiment also uses the time threshold as described above. However, 
in this case, segments that are shorter in time than the time threshold are not 
skipped. Instead, such segments are extended in time to satisfy the time threshold 
requirement. This is done by adding portions of the multimedia file before, after, or 
before and after, the short segments, for example, segment c1 to a2. Thus, the short 
segments are increased in size to enable the viewer to adequately comprehend the 
short segment. It should be noted that a second time threshold can also be used, so 
that extremely short segments, e.g., single frames, are still skipped. 
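
The additive extension can be sketched as follows. Padding equally before and after the short segment, and shifting the padded segment back inside the bounds of the multimedia file, are choices made for this example; the description allows the extension to be placed before, after, or on both sides. The parameter skip_below stands in for the second time threshold, and all names are hypothetical.

```python
from typing import List, Tuple


def extend_short_segments(intervals: List[Tuple[float, float]], time_threshold: float,
                          media_length: float, skip_below: float = 0.0) -> List[Tuple[float, float]]:
    """Lengthen segments shorter than time_threshold; skip extremely short ones."""
    result: List[Tuple[float, float]] = []
    for start, end in intervals:
        duration = end - start
        if duration <= skip_below:
            continue  # second time threshold: too short even to extend
        if duration < time_threshold:
            pad = (time_threshold - duration) / 2
            start, end = start - pad, end + pad
            if start < 0.0:               # ran off the beginning: shift right
                end -= start
                start = 0.0
            if end > media_length:        # ran off the end: shift left
                start -= end - media_length
                end = media_length
            start = max(start, 0.0)       # media shorter than the threshold
        result.append((start, end))
    return result
```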

[079] Time Threshold Based Reproduction with Multiplicative Segment 
Extension 

[080] Figure 10 shows an alternative reproduction according to the invention in an 
abstract manner where the vertical axis 50 defines an importance level, the 
horizontal axis 51 defines time, and the curve 52 indicates importance levels over 
time. Line 1000 is an importance level threshold, and line 101 is a reproduction for 
only those segments that have a particular importance greater than the threshold. 
Other segments are skipped. 

[081] This embodiment also uses the time threshold as described above. However, 
in this case, the duration of each reproduced segment that satisfies the time 
threshold is increased by a predetermined amount d. As above, the segments can be 
extended before, after, or before and after. A multiplication factor can also be 
used to achieve the same lengthening of the segments. 
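
Both forms of lengthening can be sketched briefly. The function names, the before/after flags, the clamping to the bounds of the multimedia file, and the even split of the multiplicative extension across both ends are assumptions for illustration; the inputs are taken to be segments that already satisfy the time threshold.

```python
from typing import List, Tuple


def extend_by_amount(intervals: List[Tuple[float, float]], d: float, media_length: float,
                     before: bool = True, after: bool = True) -> List[Tuple[float, float]]:
    """Lengthen each segment by a predetermined amount d before and/or after."""
    result: List[Tuple[float, float]] = []
    for start, end in intervals:
        if before:
            start = max(0.0, start - d)
        if after:
            end = min(media_length, end + d)
        result.append((start, end))
    return result


def extend_by_factor(intervals: List[Tuple[float, float]], factor: float,
                     media_length: float) -> List[Tuple[float, float]]:
    """Scale each segment's duration by factor, split evenly across both ends."""
    result: List[Tuple[float, float]] = []
    for start, end in intervals:
        extra = (factor - 1.0) * (end - start) / 2
        result.append((max(0.0, start - extra), min(media_length, end + extra)))
    return result
```

With a fixed amount every segment grows by the same number of seconds, whereas with a factor longer segments grow more; which behavior is preferable depends on the content.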

[082] Recording and Reproducing System Structure 

[083] Figure 11 shows a block diagram of a system 1100 for recording and 
reproducing compressed multimedia files and metadata files stored on read/write 
storage media 3, such as a disc or tape. 

[084] A read/write drive 110 can write data to the read buffer 11 and read data 
from the write buffer 74. The demultiplexer 12 acquires, sequentially, multimedia 
from the read buffer, and separates the multimedia into a video stream and an 
audio stream. The video decoder 13 processes the video stream, and the audio 
decoder 14 processes the audio stream. However, in this case, the metadata 
generating section 75 also receives the outputs of the decoders 13-14 so that the 
reproduced multimedia can be persistently stored on the storage media 3 using a 
recording/reproducing control section 111. 

[085] It should be noted that the importance level, indexing information and other 
metadata can also be extracted from the video and/or audio data during the 
decoding phase using the metadata generating section 75. 

[086] Furthermore, the importance level, indexing information and other metadata 
can also be generated manually and inserted at a later stage. 

[087] It should be noted that any of the above implementations can include a 
search function to enable the viewer to position directly to a particular portion of the 
multimedia based on time, frame number, or importance. The search 
function can use 'thumbnail' segments, for example, a single frame or a small number of 
frames, to assist the viewer during the searching. 

[088] Although the invention has been described by way of examples of preferred 
embodiments, it is to be understood that various other adaptations and 
modifications may be made within the spirit and scope of the invention. Therefore, 
it is the object of the appended claims to cover all such variations and 
modifications as come within the true spirit and scope of the invention. 